feat: Add Gemma3 chat handler (#1976) #1989


Open · wants to merge 4 commits into main

Conversation

@kossum commented Mar 30, 2025

Added a Gemma3 chat handler, fixed the image embedding, and added support for multiple images.

Included llama.cpp functions and structures (a quick import check follows the list):

  • clip_image_load_from_bytes
  • clip_image_batch_encode
  • clip_image_preprocess
  • clip_image_f32_batch_free
  • clip_image_u8_batch_free
  • clip_image_f32_free
  • clip_image_u8_free
  • clip_image_u8_init
  • struct clip_image_f32_batch
  • struct clip_image_u8_batch
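
To quickly confirm that the new bindings are exposed on the Python side, something like the following can be used. This assumes they are added to llama_cpp.llava_cpp alongside the existing llava clip bindings; the module path is an assumption, not confirmed by this PR.

import llama_cpp.llava_cpp as llava_cpp

new_symbols = [
  'clip_image_load_from_bytes',
  'clip_image_batch_encode',
  'clip_image_preprocess',
  'clip_image_f32_batch_free',
  'clip_image_u8_batch_free',
  'clip_image_f32_free',
  'clip_image_u8_free',
  'clip_image_u8_init',
  'clip_image_f32_batch',
  'clip_image_u8_batch',
]

# print which of the new clip bindings are actually importable
for name in new_symbols:
  print(name, 'ok' if hasattr(llava_cpp, name) else 'missing')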

Usage:

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Gemma3ChatHandler

chat_handler = Gemma3ChatHandler(clip_model_path="path/to/mmproj")
llama = Llama(
  model_path="path/to/model",
  chat_handler=chat_handler,
  n_ctx=1024,  # n_ctx should be increased to accommodate the image embedding
)

messages = [
  {
    'role': 'user',
    'content': [
      {'type': 'text', 'text': 'Please describe this image'},
      {'type': 'image', 'url': 'https://raw.githubusercontent.com/huggingface/transformers/refs/heads/main/tests/fixtures/tests_samples/COCO/000000039769.png'},
    ]
  }
]

output = llama.create_chat_completion(
  messages,
  stop=['<end_of_turn>', '<eos>'],
  max_tokens=200,
)

print(output['choices'][0]['message']['content'])
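
If the GGUF files are not already available locally, one way to fetch them is with huggingface_hub before building the handler. The repository and file names below are placeholders, not part of this PR; substitute your own.

from huggingface_hub import hf_hub_download
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Gemma3ChatHandler

# placeholder repo and file names - substitute the actual Gemma3 GGUF repo
model_path = hf_hub_download('your-org/gemma3-gguf', 'gemma3.gguf')
mmproj_path = hf_hub_download('your-org/gemma3-gguf', 'mmproj.gguf')

chat_handler = Gemma3ChatHandler(clip_model_path=mmproj_path)
llama = Llama(model_path=model_path, chat_handler=chat_handler, n_ctx=2048)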

Test Results:

Compatibility:

  • Fully backward compatible with existing interfaces.
  • Maintains original APIs while adding new options and interfaces.

@kossum mentioned this pull request Apr 2, 2025
@RuurdBijlsma commented Apr 3, 2025

I've been using it a bit and it works nicely. I had to figure out the message structure, but maybe that's normal for different chat handlers; I'm not that familiar with llama-cpp.

"type": "image",
"image": {
    "url": "https://image.com/img.jpg",
}

I was used to "image_url" being used in both places where "image_url" is used now.

@dchatel commented Apr 4, 2025

How would that work with a local image?

@kossum (Author) commented Apr 4, 2025

Sorry, I hadn't modified the original chat template of Gemma3, so I used "type": "image". I have now changed the message format to be compatible with the OpenAI API, just like the other chat handlers.

Here is a full example:

from pathlib import Path
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Gemma3ChatHandler

def image_to_base64_uri(image: bytes | str):
  import base64
  import urllib.request as request

  if isinstance(image, bytes):
    data = base64.b64encode(image).decode('utf-8')
  else:
    with request.urlopen(image) as f:
      data = base64.b64encode(f.read()).decode('utf-8')
  return f'data:image/png;base64,{data}'

chat_handler = Gemma3ChatHandler(clip_model_path='path/to/mmproj')
llama = Llama(
    model_path='path/to/model',
    chat_handler=chat_handler,
    n_ctx=2048,  # n_ctx should be increased to accommodate the image embedding
)

messages = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'please compare these pictures'},
            {'type': 'image_url', 'image_url': 'https://xxxx/img1.jpg'},
            {'type': 'image_url', 'image_url': {'url': 'https://xxxx/img2.png'}},
            {'type': 'image_url', 'image_url': image_to_base64_uri(Path('path/to/img3.jpg').read_bytes())},
            {'type': 'image_url', 'image_url': {'url': image_to_base64_uri(Path('path/to/img4.png').read_bytes())}},
            {'type': 'text', 'text': 'and then tell me which one looks the best'},
        ]
    }
]

output = llama.create_chat_completion(
    messages,
    stop=['<end_of_turn>', '<eos>'],
    max_tokens=500,
    stream=True,
)

for chunk in output:
  delta = chunk['choices'][0]['delta']
  if 'role' in delta:
    print(delta['role'], end=':\n')
  elif 'content' in delta:
    print(delta['content'], end='')

llama._sampler.close()
llama.close()
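
One caveat with the image_to_base64_uri helper above: it always labels the data URI as image/png, even for JPEG bytes. A variant that guesses the MIME type from the file name (standard library only, falling back to PNG when the type cannot be guessed) could look like this:

import base64
import mimetypes
from pathlib import Path

def file_to_base64_uri(path: str) -> str:
  # guess the MIME type from the extension; fall back to image/png
  mime = mimetypes.guess_type(path)[0] or 'image/png'
  data = base64.b64encode(Path(path).read_bytes()).decode('utf-8')
  return f'data:{mime};base64,{data}'

# e.g. {'type': 'image_url', 'image_url': file_to_base64_uri('path/to/img3.jpg')}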

@joaojhgs

Bump on this, thanks for your work! Gemma3 is a great model to have support for; I'm waiting on it!

@joaojhgs

Hey @kossum, just wondering, does this handler support function calling? I ask because the handler for LLaVA 1.5 supports both multimodal (vision) and tool calling at once. Since Gemma3 also has tool-calling capabilities, it would be great to have both in a single handler!

@kossum (Author) commented Apr 16, 2025

Hello @joaojhgs, Gemma3 (especially the 12B and 27B versions) has strong instruction-following abilities and can generate structured function-call outputs through well-designed prompts.

But unlike GPT-4 or Claude, Gemma3 does not have built-in support for tool-call tokens or JSON schema enforcement. That means:

  • No built-in tool-use markers: Gemma3 does not automatically identify or tag tool usage.
  • Requires explicit prompt design: you need to clearly define function names, parameters, and the expected output format in the prompt.
  • Lacks standardized templates: currently, Gemma3's chat_template does not include tool-use structures.

So to implement function calling with Gemma3, you must rely on carefully designed prompts to guide the model into producing the correct format.

Simple example:

import json
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Gemma3ChatHandler

chat_handler = Gemma3ChatHandler(clip_model_path='path/to/mmproj')
llama = Llama(
    model_path='path/to/model',
    chat_handler=chat_handler,
    n_ctx=2048,  # n_ctx should be increased to accommodate the image embedding
)


def analyze_image(image_id: str, description: str):
  print('image_id:', image_cache.get(image_id))
  print('description:', description)
  ...


image_cache = {'img_01': 'https://xxxx/img_01.jpg'}
function_table = {'analyze_image': analyze_image}

# input arg1
image_id = 'img_01'
# input arg2
question = f'Here is the image with ID `{image_id}`. Please analyze it.'

output = llama.create_chat_completion(
    [
        {
            'role': 'system',
            'content': '''You can call the following function:
- analyze_image(image_id: str, description: str)

You will be shown an image. First, analyze and describe its content in detail.
Then, return a function call with:
- the assigned image_id (provided in the input)
- a description of what the image shows (your own analysis)

Respond only with a JSON (without code blocks) function call like:
{
  "function": "analyze_image",
  "arguments": {
    "image_id": "<image id>",
    "description": "<description of the image>"
  }
}
'''
        },
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': question},
                {'type': 'image_url', 'image_url': image_cache[image_id]},
            ]
        }
    ],
    stop=['<end_of_turn>', '<eos>'],
    max_tokens=500,
)

data = json.loads(output['choices'][0]['message']['content'])
result = function_table[data['function']](**data['arguments'])
...
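
Because Gemma3 does not enforce a JSON schema, the reply may occasionally be malformed or name an unknown function. A small, defensive variant of the dispatch step above (the safe_dispatch helper is only an illustration, not part of this PR) could look like:

import json

def safe_dispatch(raw: str, table: dict):
  # parse the model's reply and dispatch only if it is well-formed
  try:
    data = json.loads(raw)
  except json.JSONDecodeError:
    return None  # not valid JSON - treat the reply as plain text instead
  func = table.get(data.get('function'))
  if func is None or not isinstance(data.get('arguments'), dict):
    return None  # unknown function or malformed arguments
  return func(**data['arguments'])

result = safe_dispatch(output['choices'][0]['message']['content'], function_table)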

Naturally, if multimodal capabilities aren’t needed, this chat handler can be omitted.

@joaojhgs

Hello @joaojhgs, Gemma3 (especially the 12B and 27B versions) has strong instruction-following abilities and can generate structured function-call outputs through well-designed prompts.

But unlike GPT-4 or Claude, Gemma3 does not have built-in support for tool-call tokens or JSON schema enforcement.

Thanks, I didn't know about that!
